Constructing an Annotated Corpus for Protest Event Mining
Abstract
We present a corpus for protest event mining that combines token-level annotation with the event schema and ontology of entities and events from protest research in the social sciences. The dataset uses newswire reports from the English Gigaword corpus. The token-level annotation is inspired by annotation standards for event extraction, in particular that of the Automated Content Extraction 2005 corpus (Walker et al., 2006). Domain experts perform the entire annotation task. We report competitive intercoder agreement results.
Similar Resources
Constructing Coherent Event Hierarchies from News Stories
News describe real-world events of varying granularity, and recognition of internal structure of events is important for automated reasoning over events. We propose an approach for constructing coherent event hierarchies from news by enforcing document-level coherence over pairwise decisions of spatiotemporal containment. Evaluation on a news corpus annotated with event hierarchies shows that e...
Large Scale Corpus Analysis and Recent Applications
Recent progress of corpus and machine learning-based natural language processing methodologies have made it possible to handle large scale corpus with a quite high accuracy. The speaker is now involved in a project for constructing a large scale contemporary Japanese balanced corpus, aiming at constructing automatic annotation tools on various levels of natural language analyses. I will first i...
Automatic Acquisition of Huge Training Data for Bio-Medical Named Entity Recognition
Named Entity Recognition (NER) is an important first step for BioNLP tasks, e.g., gene normalization and event extraction. Employing supervised machine learning techniques for achieving high performance recent NER systems require a manually annotated corpus in which every mention of the desired semantic types in a text is annotated. However, great amounts of human effort is necessary to build a...
Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
The distributional hypothesis, which states that words that occur in similar contexts tend to have similar meanings, has inspired several Web mining algorithms for paraphrasing semantically equivalent phrases. Unfortunately, these methods have several drawbacks, such as confusing synonyms with antonyms and causes with effects. This paper introduces three Temporal Correspondence Heuristics, that...
Constructing a Temporal Relation Identification System of Chinese based on Dependency Structure Analysis
"Temporal information (Time)" has been a subject of study in many disciplines particularly in philosophy, physics, and is an important dimension of natural language processing. The temporal information includes temporal expressions, event and temporal relations. There are many researches dealing with the temporal expressions and event expressions. However, researches on temporal relation identi...